perm filename PASTE[L70,TES] blob sn#017425 filedate 1972-12-14 generic text, type T, neo UTF8
COMMENT ⊗   VALID 00010 PAGES 
RECORD PAGE   DESCRIPTION
 00001 00001
 00002 00002	CHAPTERS AND PARTS OF CHAPTERS FOR PAPERS AND MANUALS
 00003 00003	OUTLINE:				     TUTORIAL	MANUAL-ONLY
 00005 00004	P-INT	INTRODUCTION
 00007 00005	P-RR	REWRITE RULES
 00009 00006	P-LIT	LITERALS
 00012 00007	P-RF	REWRITE FUNCTIONS
 00014 00008	P-PRIV	PRIVATE VARIABLES
 00018 00009	P-EL	ELLIPSIS
 00019 00010	P-PREC	PRECEDENCE OF RULES
 00022 ENDMK
⊗;
CHAPTERS AND PARTS OF CHAPTERS FOR PAPERS AND MANUALS

TUTORIAL ARTICLE ON "PATTERN DIRECTED STREAM PROCESSING"
	STRESS USES, IMPLEMENTATION, HISTORY
	CF. FORMAT DECLARATIONS, RPG, DECISION TABLES
	PROBABLY OMIT CAUSATION, MEXPRS, PUBLIC VARIABLES, OR PUT IN AFTERNOTE

A.I. CONFERENCE
	ASSUME KNOWLEDGE OF LISP, PATTERN MATCHING, AND BACKTRACKING
	STRESS ORGANIZATION OF DATA BASE INTO PARTIAL FUNCTIONS;
		IMPLICIT INVERSES; STREAMING; PRECEDENCE; EXTENSIBLE LANGUAGE

SIGPLAN
	STRESS EXTENSIBILITY AND MACHINE INDEPENDENCE
OUTLINE:				     TUTORIAL	MANUAL-ONLY

  P-	PATTERN DIRECTED COMPUTATION
		INT	INTRODUCTION
		RR	REWRITE RULES
		LIT	LITERALS	     ` ' ≡ ( )
		RF	REWRITE FUNCTIONS
		PRIV	PRIVATE VARIABLES	:
		EL	ELLIPSIS		..
		PREC	PRECEDENCE OF RULES	≤≥
		LISP	LISP SUBSET
		EVAL	EVALUATION		#
		APPL	APPLICATION		OMIT	@ {} *
		SEG	SEGMENTS	     :: ## ...	@@ ** $
		PRED	PREDICATION		?
		CONV	CONVERSION		<>

  M-	MLISP INTERFACE				OMIT
		MEXPR	MEXPRS IN PEXPRS		[]  /
		PEXPR	PEXPRS IN MEXPRS		{}
		PUBL	PUBLIC VARIABLES IN P		% %% %# %() %[] %#() %#[]

  E-	EXTENSION
		EXT	EXTENDING REWRITE FNS
		USES	USES FOR EXTENSIBLE FUNCTIONS

  C-	CAUSATION				OMIT
		SIDE	SIDE EFFECTS			&
		DEMON	INSTIGATING CAUSES		&

  I-	IMPLEMENTATION
		LR	TOP-DOWN LEFT-RIGHT
				BACK	BACKTRACKING; LEFT-RECURSION; →→
				FACT	FACTORING OF RULES
				ALTER	EXTENDING
				SD	SOURCE AND DESTINATION

		PEEK	LOOK AHEAD TECHNIQUES
		SPEC	SPECIAL CASE DETECTION
		PARA	PARALLEL COMPUTERS
P-INT	INTRODUCTION

↓_Pattern-directed computation_↓ involves two kinds of operations on
data structures: ∪decomposition and ∪recomposition.  Decomposition
breaks down an ↓_input stream_↓ into components under the direction
of a ↓_decomposing pattern_↓ ("dec").  The inverse operation,
recomposition, constructs an ↓_output stream_↓ under the direction of
a ↓_recomposing pattern_↓ ("rec").

The distinguishing characteristic of pattern-directed computation is
that it appears to be directed by a pattern rather than by an
algorithm, i.e., by a picture rather than by instructions.  Of course,
to implement it on a conventional computer, the pattern must be
translated to or interpreted by an algorithm.

Decomposition is sometimes called "pattern-matching", and its
interpreters are called "pattern-matchers".   See [MAD DOCTOR, ELIZA,
SNOBOL, COMIT, FLIP, PLANNER, QA-4, FLEX, MLISP-2, ...].

The abilty to use pictures to guide decomposition facilitates the
programming of table-driven syntactic parsers and structured-data
information retrieval systems.  Pattern-driven recomposition simplifies
the specification of code generators, output formats, and
structured-data transformations.
P-RR	REWRITE RULES

A ↓_rewrite rule_↓ is of the form:
.I
dec → rec
It defines a partial function on streams as follows: if the input stream
matches the dec, then the output stream is generated by the rec.

The following rewrite rule is part of the "square" function:
.I
5 → 25
If the input stream consists solely of the token `5' (a number), then the output
stream consists solely of the token `25'.

The following rewrite rule
is part of the "square root" function:
.I
25 → -5 5
If the input stream consists solely of the token `25', then the output
stream consists of the two tokens `-5' and `5'.

The next rule could
be part of a question-answering function:
.I
`HOW ARE YOU?' → `VERY WELL AND YOU?'
If the input stream consists of the four tokens:
.I
HOW ARE YOU ?
then the output stream consists of the five tokens:
.I
VERY WELL AND YOU ?
P-LIT	LITERALS

In the above examples, the input and output streams contained only
↓_atomic tokens_↓ such as numbers and words.  Sometimes, a token is a
non-atomic (structured) entity such as a ↓_list_↓.  A list will be represented
by a pair of parentheses enclosing its elements written in left-to-right
order.  Thus, the following rewrite rule is part of the "transpose" function:
.I
((A B C) (D E F)) → ((A D) (B E) (C F))
It takes an input stream containing the single token ((A#B#C)#(D#E#F)) and
generates an output stream containing the single token ((A#D)#(B#E)#(C#F)).
The lists in this case are meant to represent a 2x3 and a 3x2 matrix.

Different pattern-matchers employ a variety of notations to express similar
concepts.  In the notation of this paper, single quotes will be
used to surround atomic literals.  Several
atomic literals may appear within a single
pair of quotes.  Each literal contains either a
series of alphanumeric characters or a single non-alphanumeric character.

Some characters have
special meaning between single quotes:
.I
`	'	:	≡
To use such a character as a literal, it must be preceded by
a ≡ to override its special meaning.  Example:
.I
	`I SAY≡: WHAT≡'S UP?' → `NOT MUCH.'
This rule matches an input stream containing the eight tokens:
.I
I SAY : WHAT ' S UP ?
and generates the output stream:
.I
NOT MUCH .

The problems of separating a character-oriented input stream into tokens
and of editing superfluous spaces out of a character-oriented output stream
will be discussed later.  It is henceforth assumed that this has been done.
P-RF	REWRITE FUNCTIONS

A rewrite rule defines a partial function, for example, the mapping of
some particular token into some other particular token.  A broader
partial function can be defined as the union of several rewrite rules.
A ↓_rewrite function definition_↓ is of the form:
.B
<name>	= dec1 → rec1 ;
	= dec2 → rec2 ;
	.
	.
	.
	= decn → recn ;
.E
For example, a larger part of the "square function" than shown earlier, now
encompassing six cases, can be defined by:
.B
<square>= 0 → 0 ;
	= 1 → 1 ;
	= 2 → 4 ;
	= 3 → 9 ;
	= 4 → 16 ;
	= 10 → 100 ;
.E
Thus, square(1)=1 and square(10)=100, but square(6) is undefined.

The rules comprising a rewrite function definition may appear in any order.
Note that a partial inverse function often can be derived by interchanging the dec
and rec in every rule.  Thus, the above definition defines implicitly:
.I
<inverse square>= 0 → 0 ;
		= 1 → 1 ;
		= 4 → 2 ;
		= 9 → 3 ;
		= 16 → 4 ;
		= 100 → 10 ;
.E
P-PRIV	PRIVATE VARIABLES

A function is difficult to define if every case must be enumerated.
Therefore, pattern matchers allow ↓_variables_↓ to appear in patterns.
The value of a variable can be either a list or an atomic token.
In this paper, the notation:
.I
:X
where X is any identifier, will denote the ↓_private variable named X_↓.
The private variables of each rule are distinct from the private variables
of all other rules, even if their names are the same; hence the adjective
"private".

The following definition has only three rewrite rules but handles an
unlimited number of input streams:
.B
<reply>	= `HOW ARE YOU?'  →  `VERY WELL, AND YOU?'  ;
	= `HOW IS :X?'  →  `I HAVE NOT SEEN :X.' ;
	= `DID :X GO TO :Y?'  →  `WHY DON≡'T YOU ASK :X YOURSELF?' ;
.E
For example, reply(DID#JOHN#GO#TO#SCHOOL?) = WHY#DON'T#YOU#ASK#JOHN#YOURSELF?
and reply(DID#SAM#GO#TO#WORK?) = WHY#DON'T#YOU#ASK#SAM#YOURSELF?.

There are also an infinite number of input streams not handled by the above
definition.  For example, reply(SEE#YOU#LATER.) is undefined.  Thus, this
version of "reply" is still a partial function.  In fact, most rewrite
function definitions do define partial functions.  One exception is:
.I
<identity> = :Y → :Y ;

A private variable can appear more than once in a single dec pattern.  It
must match identical tokens at each appearance.  Example:
.I
<answer>= `WILL :X OR WON≡'T :X?'  →  `I DON≡'T KNOW.' ;
Thus, answer(WILL#ALICE#OR#WON'T#ALICE?) = I#DON'T#KNOW., but
answer(WILL#JANE#BUT#WON'T#FRED?) is undefined.

Systematic renaming of the private variables or any rewrite rule does not
change the meaning of the rule.  The following two rules are equivalent:
.B
:X (:X :Z)  →  (:Z :Z)
:YA (:YA :R) → (:R :R)
.E
If the input stream contains the two tokens:
.I
7   (7 (HELLO JOE))
then the atom 7 will be bound to X (or to YA) and the list (HELLO#JOE) will
be bound to Z (or to R).  The output stream from either rule will be:
.I
((HELLO JOE) (HELLO JOE))
P-EL	ELLIPSIS

To make patterns easier to read and write, the double-period ellipsis symbol
can be used to symbolize an unnamed private variable.  Thus, the following
two rules are equivalent:
.B
`IS :X COMING?'  →  `NO, :X COULD NOT MAKE IT.'
`IS .. COMING?'  →  `NO, .. COULD NOT MAKE IT.'
.E

If an ellipsis occurs several times in a pattern, it designates a
different variable each time.  The n'th double-period in the dec
designates the same variable as the n'th double-period in the rec.
The following two rules are equivalent:
.B
:X (:Y :Z) → (:X :Y)
.. (.. ..) → (.. ..)
.E
P-PREC	PRECEDENCE OF RULES

It is possible that a given input stream will match the decs of more than
one rule of a rewrite function.  For the time being, it will be assumed that
one and only one of those rules that match will be applied.  The decision as
to which one applies will be based on criteria outlined
below.  This assumption is essentially ↓_deterministic_↓.  If the pattern-matcher
were embedded in a non-deterministic system (implemented by backtracking or by
multiple processes), then more than one rule could be applied.  Such a situation
will be discussed later in this paper.

In many pattern matchers, the rules are ordered by the programmer, and the
first rule that matches is applied.  But in the pattern matcher of this paper,
the programmer may write rules in any order.  This makes it easier for him
to add a new rule to a rewrite function, because it is usually unnecessary to
study the rules that are already there.  However, it requires the pattern
matcher to utilize ↓_precedence criteria_↓ to impose an ordering on the rules.

The basic precedence criterion is that the rule with the most specific dec pattern
comes first.  For example, if part of the "reply" function is:
.B
<reply>	= `I SEE :X.'  →  `SO WHAT?' ;
	= `I SEE SUE.'  →  `WHERE?' ;
.E
then the input stream `I SEE SUE.' matches the decs of both rules.  However, the
second rule is more specific since it only applies to SUE whereas the first rule
applies to anyone.  Therefore, reply(I#SEE#SUE.) = WHERE?#.